The FG Programming Environment: Reducing Source Code Size for Parallel Programs Running on Clusters

Authors

  • Elena Riccio Davidson
  • Thomas H. Cormen
Abstract

FG is a programming environment designed to reduce the source code size and complexity of out-of-core programs running on clusters. Our goals for FG are threefold: (1) make these programs smaller, (2) make them faster, and (3) reduce time-to-solution. In this paper, we focus on the first metric: the efficacy of FG for reducing source code size and complexity. We designed FG to fit programs, including high-end computing (HEC) applications, for which hiding latency is paramount to an efficient implementation. Specifically, we target out-of-core programs that fit into a pipeline framework. We use as benchmarks three out-of-core implementations: bit-matrix-multiply/complement (BMMC) permutations, fast Fourier transform (FFT), and columnsort. FG reduces source code size by approximately 14–26% for these programs. Moreover, we believe that the code FG eliminates is the most difficult to write and debug.
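
All three benchmarks share the structure FG targets: data too large for memory is processed in block-sized passes over the disk. The sketch below (generic C++ with illustrative names and file handling; it is not FG code) shows the plain synchronous form of one such pass, in which the processor sits idle during every read and write.

```cpp
// Synchronous out-of-core pass: read one block, operate on it, write it back.
// All names here are illustrative stand-ins, not part of FG.
#include <cstdio>
#include <cstdlib>
#include <vector>

// Stand-in for the real per-block work (a BMMC permutation step, an FFT, a sort pass, ...).
void process_block(std::vector<double>& block) {
    for (double& x : block) x *= 2.0;
}

void out_of_core_pass(std::FILE* in, std::FILE* out,
                      std::size_t block_elems, std::size_t num_blocks) {
    std::vector<double> block(block_elems);
    for (std::size_t i = 0; i < num_blocks; ++i) {
        if (std::fread(block.data(), sizeof(double), block_elems, in) != block_elems)
            std::abort();                                              // CPU idle while the disk works
        process_block(block);                                          // disk idle while the CPU works
        std::fwrite(block.data(), sizeof(double), block_elems, out);   // CPU idle again
    }
}

int main() {
    const std::size_t block_elems = 1 << 16, num_blocks = 8;
    std::FILE* in = std::tmpfile();                  // throwaway input file for the demo
    std::FILE* out = std::tmpfile();
    std::vector<double> block(block_elems, 1.0);
    for (std::size_t i = 0; i < num_blocks; ++i)
        std::fwrite(block.data(), sizeof(double), block_elems, in);
    std::rewind(in);
    out_of_core_pass(in, out, block_elems, num_blocks);
    std::fclose(in); std::fclose(out);
}
```

Hiding the read and write latencies means turning this loop into overlapping asynchronous operations with several buffers in flight; by the abstract's measurements, that machinery is roughly the 14–26% of the source that FG eliminates, and in the authors' view the hardest part to write and debug.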


Similar Articles

FG: A Framework Generator for Hiding Latency in Parallel Programs Running on Clusters

FG is a programming environment for asynchronous programs that run on clusters and fit into a pipeline framework. It enables the programmer to write a series of synchronous functions and represents them as stages of an asynchronous pipeline. FG mitigates the high latency inherent in interprocessor communication and accessing the outer levels of the memory hierarchy. It overlaps separate pipelin...
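
The pattern described here can be pictured as ordinary synchronous stage bodies run on separate threads and connected by thread-safe queues, so that reading, computing, and writing overlap. The sketch below is a generic illustration of that idea in C++; the queue class, stage functions, and all names are hypothetical stand-ins and do not reflect FG's actual interface.

```cpp
// Generic three-stage asynchronous pipeline: read -> compute -> write.
// Hypothetical illustration of the pattern; not FG's API.
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

using Buffer = std::vector<double>;

// Minimal thread-safe queue that hands buffers from one stage to the next.
// An empty optional is the end-of-stream marker.
class BufferQueue {
public:
    void push(std::optional<Buffer> b) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(b)); }
        cv_.notify_one();
    }
    std::optional<Buffer> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        std::optional<Buffer> b = std::move(q_.front());
        q_.pop();
        return b;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<Buffer>> q_;
};

// Synchronous stage bodies the programmer writes (stand-ins here).
Buffer read_block(std::size_t i) { return Buffer(1 << 16, double(i)); } // pretend disk read
void compute(Buffer& b)          { for (double& x : b) x *= 2.0; }      // pretend permute/FFT/sort step
void write_block(const Buffer&)  {}                                     // pretend disk write

int main() {
    const std::size_t num_blocks = 64;
    BufferQueue to_compute, to_write;

    std::thread reader([&] {                 // stage 1 runs ahead, filling the queue
        for (std::size_t i = 0; i < num_blocks; ++i) to_compute.push(read_block(i));
        to_compute.push(std::nullopt);
    });
    std::thread computer([&] {               // stage 2 overlaps with both I/O stages
        while (auto b = to_compute.pop()) { compute(*b); to_write.push(std::move(*b)); }
        to_write.push(std::nullopt);
    });
    std::thread writer([&] {                 // stage 3 drains completed buffers
        while (auto b = to_write.pop()) write_block(*b);
    });

    reader.join(); computer.join(); writer.join();
}
```

A production framework would also bound and recycle the buffers in flight rather than allocate one per block; that buffer and thread management, which here dwarfs the three one-line stage bodies, is the kind of code an environment like FG generates so the programmer writes only the synchronous stages.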

Andes: a Performance Analyzer for Parallel Programs

ANDES is a performance monitor designed for MIMD distributed memory machines that inserts additional code in the program to be analyzed. ANDES determines the following metrics: speedup, efficiency, experimentally determined serial fraction, percentage of idle time per processor, load and communication balancing, synchronization time and percentage of CPU-communication overlapping. Within ANDES, a...
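
For reference, the first three metrics in this list are conventionally defined as below; these are the standard definitions, stated here as an assumption since the excerpt does not give ANDES's exact formulas. T(1) is the one-processor running time and T(p) the running time on p processors.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
e(p) = \frac{1/S(p) - 1/p}{1 - 1/p}
```

Here e(p) is the experimentally determined serial fraction, commonly known as the Karp-Flatt metric.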

CUDA-For-Clusters: A System for Efficient Execution of CUDA Kernels on Multi-core Clusters

Rapid advancements in multi-core processor architectures along with low-cost, low-latency, high-bandwidth interconnects have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs to efficiently utilize all the resources in such a cluster is still a major challenge. Programmers have to manually deal with low-level details that should idea...

The Distributed Application Debugger

Developing parallel programs which run on distributed computer clusters introduces additional challenges to those present in traditional sequential programs. Debugging parallel programs requires not only inspecting the sequential code executing on each node but also tracking the flow of messages being passed between them in order to infer where the source of a bug actually lies. This thesis foc...

Parallel and Distributed Programming with Pthreads and Rthreads

This paper describes Rthreads (Remote threads), a software distributed shared memory system that supports sharing of global variables on clusters of computers with physically distributed memory. Other DSM systems either use virtual memory to implement coherence on networks of workstations or require programmers to adopt a special programming model. Rthreads uses primitives to read and write rem...



Publication date: 2005